ABSTRACT

We present and describe a catalog of galaxy photometric redshifts (photo-z's) for the Sloan
Digital Sky Survey (SDSS) Coadd Data. We use the Artificial Neural Network (ANN) technique
to calculate photo-z's and the Nearest Neighbor Error (NNE) method to estimate photo-z errors
for ~13 million objects classified as galaxies in the coadd with r < 24.5. The photo-z and
photo-z error estimators are trained and validated on a sample of ~83,000 galaxies that have
SDSS photometry and spectroscopic redshifts measured by the SDSS Data Release 7 (DR7),
the Canadian Network for Observational Cosmology Field Galaxy Survey (CNOC2), the Deep
Extragalactic Evolutionary Probe Data Release 3 (DEEP2 DR3), the Visible imaging
Multi-Object Spectrograph - Very Large Telescope Deep Survey (VVDS) and the WiggleZ Dark Energy
Survey. For the best ANN methods we have tried, we find that 68% of the galaxies in the
validation set have a photo-z error smaller than $\sigma_{68} = 0.031$. After presenting our
results and quality tests, we provide a short guide for users accessing the public data.

Subject headings: photometric redshifts sdss - Sloan Digital Sky Survey 



1. Introduction 

In recent years, digital sky surveys have obtained multi-band imaging for of order a hundred
million galaxies; however, spectroscopic redshifts are available for only just over one million
galaxies. Deep, wide-area surveys planned for the next decades will increase the number of
galaxies with multi-band photometry to a few billion, and we will only be able to obtain
spectroscopic redshifts for a small fraction of these objects because of technological and
financial limitations. As a result, substantial effort has been going into developing
photometric redshift (photo-z) techniques, which use multi-band photometry to estimate
approximate galaxy redshifts. For many applications in extragalactic astronomy and cosmology,
the resulting photometric redshift precision is sufficient for the science goals at hand,
provided one can accurately characterize the uncertainties in the photo-z estimates.

Two broad categories of photo-z estimators are in wide use: template-fitting and training set
methods. In template-fitting, one assigns a redshift to a galaxy by finding the redshifted
spectral energy distribution (SED), selected from a library of templates, that best reproduces
the observed fluxes in the broadband filters. By contrast, in the training set approach, one
uses a training set of galaxies with spectroscopic redshifts and photometry to derive an
empirical relation between photometric observables (e.g., magnitudes, colors, and morphological
indicators) and redshift. Examples of empirical methods include Polynomial Fitting (Connolly
et al. 1995), the Nearest Neighbor method (Csabai et al. 2003), the Nearest Neighbor Polynomial
(NNP) technique (Oyaizu et al. 2008a), Artificial Neural Networks (ANN) (Collister & Lahav
2004; Vanzella et al. 2004; d'Abrusco et al. 2007), and Support Vector Machines (Wadadekar
2005). When a large spectroscopic training set that is representative of the photometric data
set to be analyzed is available, training set techniques typically outperform template-fitting
methods, in the sense that the photo-z estimates have smaller scatter and bias with respect to
the true redshifts (Oyaizu et al. 2008a). On the other hand, template-fitting can be applied to
a photometric sample for which relatively few spectroscopic analogs exist. For a comprehensive
review and comparison of photo-z methods, see Oyaizu et al. (2008a).

In this paper, we present a publicly available galaxy photometric redshift catalog for the
coadd data, which are part of the Seventh Data Release (DR7) of the Sloan Digital Sky Survey
(SDSS) imaging catalog (Blanton et al. 2003; Eisenstein et al. 2001; Gunn et al. 1998; Ivezic
et al. 2004; Strauss et al. 2002; York et al. 2000; Abazajian et al. 2009). We use the ANN
photo-z method,
which has proved to be a superior training set 
method (Oyaizu et al. 2008a), and briefly com- 
pare the results using different photometric ob- 
servables. Since the SDSS photometric catalog 
covers a large area of sky, a number of deep spec- 
troscopic galaxy samples with SDSS photometry 
are available to use as training sets, as shown in 
Fig. 1. 

2. SDSS Photometric Catalog and Galaxy 
Selection 

The SDSS comprises a large-area imaging survey of the north Galactic cap, a multi-epoch
imaging survey of an equatorial stripe in the south Galactic cap, and a spectroscopic survey
of roughly 10^6 galaxies and 10^5 quasars (York et al. 2000). The survey uses a dedicated,
wide-field, 2.5m telescope (Gunn et al. 1998) at Apache Point Observatory, New Mexico. Imaging
is carried out in drift-scan mode using a 142 mega-pixel camera (Gunn et al. 2006) that gathers
data in five broad bands, ugriz, spanning the range from 3,000 to 10,000 Å (Fukugita et al.
1996), with an effective exposure time of 54.1 seconds per band. The images are processed using
specialized software (Lupton et al. 2001; Stoughton et al. 2002) and are astrometrically (Pier
et al. 2003) and photometrically (Hogg et al. 2001; Tucker et al. 2006) calibrated using
observations of a set of primary standard stars (Smith et al. 2002) observed on a neighboring
20-inch telescope.

The seventh SDSS Data Release (DR7) imaging footprint increased by ~22% compared to the
previous data release (DR6), which covers an essentially contiguous region of the north
Galactic cap. The additional coverage includes the small missing patches in the contiguous
region of the north Galactic cap and the stripes that are part of the SEGUE (Sloan Extension
for Galactic Understanding and Exploration) survey. In any region where imaging runs overlap,
one run is declared primary (see footnote 1) and is used for spectroscopic target selection;
other runs are declared secondary. The area covered by the DR7 primary imaging survey,
including the southern stripes, is 11,663 deg^2 (Abazajian et al. 2009).

The SDSS stripe along the celestial equator in the Southern Galactic Cap ("Stripe 82") was
imaged multiple times in the Fall months. This was first carried out to allow a co-addition of
the repeat imaging scans in order to reach fainter magnitudes, roughly 2 mag fainter than the
single SDSS scans (see Table 1). The co-addition includes a total of 122 runs, covering any
given piece of the ~250 deg^2 area between 20 and 40 times. The co-addition runs are designated
106 and 206 under the Stripe82 database in the Catalog Archive Server (CAS) (see the SDSS
CasJobs website, http://casjobs.sdss.org/casjobs/). The reader can find a detailed description
of the co-addition in Annis et al. (2011).

The SDSS database provides a variety of measured magnitudes for each detected object.
Throughout this paper, we use dereddened model magnitudes to perform the photometric redshift
computations. To determine the model magnitude, the SDSS photometric pipeline fits two models
to the image of each galaxy in each passband: a de Vaucouleurs (early-type) and an exponential
(late-type) light profile. The models are convolved with the estimated point spread function
(PSF), with arbitrary axis ratio and position angle. The best-fit model in the r band (which is
used to fix the model scale radius) is then applied to the other passbands and convolved with
the passband-dependent PSFs to yield the model magnitudes. Model magnitudes provide an unbiased
color estimate in the absence of color gradients (Stoughton et al. 2002), and the dereddening
procedure removes the effect of Galactic extinction (Schlegel et al. 1998).

Footnote 1: For the precise definition of primary objects see
http://cas.sdss.org/dr7/en/help/docs/glossary.asp#P

Fig. 1. Normalized r magnitude distributions for various catalogs. Top three rows: the
distributions of the spectroscopic catalogs used for photo-z training and validation are shown
for CNOC2, DEEP2, VVDS, WiggleZ and SDSS DR7. "Entries" denotes the number of unique galaxy
measurements used from each catalog. Bottom left: distribution for the whole spectroscopic
sample. Bottom right: the distribution for the SDSS coadd galaxy sample, where objects were
classified as galaxies according to the photometric TYPE flag (see text).

To construct the photometric sample of galaxies for which we wish to estimate photo-z's, we
obtained a catalog drawn from the SDSS CasJobs website. We checked some of the SDSS photometric
flags to ensure that we obtained a reasonably clean galaxy sample. In particular, we selected
all primary objects from Stripe82 that have the TYPE flag equal to 3 (the type for galaxy) and
that do not have any of the flags BRIGHT, SATURATED, or SATUR_CENTER set. For the definitions
of these flags we refer the reader to the PHOTO flags entry at the SDSS website (see footnote
2) or to Appendix A. We also took into account the nominal SDSS coadd flux limit by only
selecting galaxies with dereddened model magnitude r < 24.5. In addition, the co-addition does
not propagate information on saturated pixels in individual runs, and therefore the photometry
of objects brighter than r = 15.5 is suspect. To circumvent this issue we selected only
galaxies with r > 16. The full database query we used is given in Appendix A.

The final photometric sample comprises 13,688,828 galaxies. Only 2,267 of these objects are in
the DR6 photometric redshift catalog from Oyaizu et al. (2008a). The r magnitude distribution
of this sample is shown in the bottom right panel of Fig. 1; the g - r and r - i color
distributions are shown in the bottom panels of Fig. 2.

3. Spectroscopic Training and Validation Sets



Since our methods to estimate photo-z's and photo-z errors are training-set based, we would
ideally like the spectroscopic training set to be fully representative of the photometric
sample to be analyzed, i.e., to have similar statistical properties and magnitude/redshift
distributions. Training-set methods can be thought of as inherently Bayesian, in the sense that
the training-set distributions form effective priors for the analysis of the photometric
sample; to the extent that the training-set distributions reflect those of the photometric
sample, we may expect the photo-z estimates to be unbiased (or at least they will not be biased
by the prior). Given the practical difficulties of carrying out spectroscopy at faint
magnitudes and low surface brightness, such an ideal generally cannot be achieved.
Realistically, all we can hope for is a training set that (a) is large enough that statistical
fluctuations are small and (b) spans the same magnitude, color, and redshift ranges as the
photometric sample (Oyaizu et al. 2008a).

Footnote 2: http://cas.sdss.org/dr7/en/help/browser/browser.asp

Table 1
SDSS Coadd Properties

Band    AB magnitude limit
u       23.25
g       23.51
r       23.26
i       22.69
z       21.27

Note. Magnitude limits are for 50% completeness for galaxies in typical seeing (Annis et al.
2011). The median seeing for the SDSS imaging survey is 1.4".

Fig. 2. Normalized distribution of g - r and r - i colors. Top row: the color distributions for
galaxies in the full spectroscopic sample. Bottom row: the color distributions for galaxies in
the photometric sample. As above, galaxy classification used the photometric TYPE flag.

We have constructed a spectroscopic sample consisting of 82,741 galaxies that have SDSS coadd
photometry measurements and that have spectroscopic redshifts measured by the SDSS or by other
surveys, as described below. We imposed a magnitude limit of 16 < r < 24.5 on the spectroscopic
sample and applied additional cuts on the quality of the spectroscopic redshifts reported by
the different surveys. Each survey providing spectroscopic redshifts defines a redshift quality
indicator; we refer the reader to the respective publications listed below for their precise
definitions. For each survey, we chose a redshift quality cut roughly corresponding to 90%
redshift confidence or greater. The SDSS spectroscopic sample provides 57,020 redshifts with
confidence level zConf > 0.9. The remaining redshifts are: 1,355 from the Canadian Network for
Observational Cosmology Field Galaxy Survey (CNOC2; Yee et al. 2000), 9,955 from the Deep
Extragalactic Evolutionary Probe (DEEP2; Weiner et al. 2005; see footnote 3) with zquality > 3,
8,702 from the WiggleZ Dark Energy Survey (Drinkwater et al. 2010) with QoP > 3, and 5,709 from
the Visible imaging Multi-Object Spectrograph - Very Large Telescope Deep Survey (VVDS; Garilli
et al. 2008) with flags 3 and 4.
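As a quick consistency check on the sample sizes quoted above, the per-survey counts can be
summed and compared with the total of 82,741. This is only an arithmetic sketch: the dictionary
name is an illustrative choice, and the CNOC2 entry is read as 1,355, the value that makes the
per-survey counts add up to the quoted total.

```python
# Per-survey numbers of spectroscopic redshifts quoted in the text.
# Assumption: the CNOC2 count is read as 1,355, the value that closes the sum.
spec_counts = {
    "SDSS DR7": 57020,
    "CNOC2": 1355,
    "DEEP2": 9955,
    "WiggleZ": 8702,
    "VVDS": 5709,
}

total_spec = sum(spec_counts.values())
print(total_spec)  # 82741
```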

The spectroscopic sample obtained by combining all these catalogs was divided into two catalogs
of the same size (~42,000 objects each). One of these catalogs was taken to be the training set
used by the photo-z and error estimators, and the other was used as a validation set to carry
out tests of photo-z quality (see §4.1).
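The division into two equal-sized catalogs can be sketched as follows. The text does not say
how objects were assigned to each half, so the random permutation, the seed, and the function
name `split_half` are assumptions made for illustration.

```python
import numpy as np

def split_half(n_objects, seed=0):
    """Return index arrays for two disjoint, (nearly) equal-sized random halves."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_objects)   # random ordering of all object indices
    half = n_objects // 2
    return order[:half], order[half:]

# 82,741 spectroscopic galaxies -> two catalogs of ~42,000 objects each
train_idx, valid_idx = split_half(82741)
```

Because the total is odd, one half ends up with one more object than the other.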

The r-magnitude distributions for each spectroscopic sample are shown in Fig. 1, while Fig. 2
shows the color (g - r and r - i) distributions for all objects in the final spectroscopic
sample. To assess how representative the spectroscopic training and validation samples are of
the full photometric sample, we checked that the color/magnitude space is fully covered by the
spectroscopic sample up to redshift 0.75 - 0.8. Beyond this redshift range, the spectroscopic
sample only partially covers the color/magnitude space. Therefore, readers need to be cautious
when using photo-z's beyond this range.

4. Methods 

4.1. ANN Photometric redshifts 

The ANN method that we use to estimate 
galaxy photo-z's is a general classification and in- 
terpolation tool used successfully in a variety of 
fields. We use a particular type of ANN called a 
Feed Forward Multilayer Perceptron to map the 
relationship between photometric observables and 
redshifts, as implemented in Oyaizu et al. (2008a). 

In this work we use X:15:15:15:1 networks to estimate photo-z's, where X is the number of input
photometric parameters per galaxy, following the notation of Collister & Lahav (2004). The
corresponding number of degrees of freedom (the number of weights) is roughly 1,000, depending
on the actual value of X.

Footnote 3: http://deep.berkeley.edu/DR2/
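To make the X:15:15:15:1 notation concrete, the sketch below builds a fully connected
feed-forward network with three hidden layers of 15 nodes and a single output node, and counts
its parameters. It is not the implementation of Oyaizu et al. (2008a): the tanh activation, the
one-bias-per-node convention, the initialization scale, and all function names are assumptions.
Under these assumptions the count is 15X + 511 (586 for X = 5), which is of the same order as
the rough figure quoted above; the exact number depends on conventions not given here.

```python
import numpy as np

LAYERS_HIDDEN = (15, 15, 15)  # hidden-layer sizes of an X:15:15:15:1 network

def init_network(n_inputs, rng):
    """Random weights and zero biases for a fully connected X:15:15:15:1 network."""
    sizes = (n_inputs, *LAYERS_HIDDEN, 1)
    return [
        (rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(network, x):
    """Map an (N, X) array of photometric observables to N redshift estimates."""
    a = np.asarray(x, dtype=float)
    for weights, biases in network[:-1]:
        a = np.tanh(a @ weights + biases)    # hidden layers (tanh is an assumption)
    weights, biases = network[-1]
    return (a @ weights + biases).ravel()    # single linear output node

def n_parameters(network):
    """Total number of weights and biases."""
    return sum(w.size + b.size for w, b in network)

net = init_network(5, np.random.default_rng(0))   # X = 5 for the ugriz case
print(n_parameters(net))   # 586 = 15*X + 511 for X = 5
```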

Following Oyaizu et al. (2008a), in order to 
avoid over-fitting, the spectroscopic sample is di- 
vided into two independent subsets, the training 
and validation sets, and the formal minimizations 
are done using the training set. After each mini- 
mization step, the network is evaluated on the val- 
idation set, and the set of weights that performs 
best on the validation set is chosen as the final 
set. To reduce the chance of ending in a less-than- 
optimal local minimum, we minimize five net- 
works starting at different positions in the space of 
weights. Among these, we choose the network that 
gives the lowest photo-z scatter in the validation 
set. 
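The procedure in this paragraph (minimize on the training set, keep the weights that score best
on the validation set, repeat from five starting points, and keep the best run) can be
illustrated with a toy model. The sketch below uses a small one-hidden-layer network trained by
plain gradient descent, which is a stand-in chosen for brevity and not the minimizer of Oyaizu
et al. (2008a); the layer size, learning rate, step count, and all names are assumptions.

```python
import numpy as np

def init_weights(rng, n_in, n_hidden=8):
    return {"W1": rng.normal(0.0, 0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.5, n_hidden), "b2": 0.0}

def predict(w, x):
    return np.tanh(x @ w["W1"] + w["b1"]) @ w["W2"] + w["b2"]

def rms_scatter(w, x, z):
    return float(np.sqrt(np.mean((predict(w, x) - z) ** 2)))

def gradient_step(w, x, z, lr):
    """One gradient-descent step on the mean squared error; returns new weights."""
    h = np.tanh(x @ w["W1"] + w["b1"])
    resid = h @ w["W2"] + w["b2"] - z
    dh = np.outer(resid, w["W2"]) * (1.0 - h ** 2)   # back-propagate through tanh
    n = len(z)
    return {"W1": w["W1"] - lr * (x.T @ dh) / n, "b1": w["b1"] - lr * dh.mean(axis=0),
            "W2": w["W2"] - lr * (h.T @ resid) / n, "b2": w["b2"] - lr * resid.mean()}

def train_one(seed, x_tr, z_tr, x_va, z_va, n_steps=300, lr=0.1):
    """Minimize on the training set; keep the weights that do best on validation."""
    w = init_weights(np.random.default_rng(seed), x_tr.shape[1])
    best_w, best_score = w, rms_scatter(w, x_va, z_va)
    for _ in range(n_steps):
        w = gradient_step(w, x_tr, z_tr, lr)
        score = rms_scatter(w, x_va, z_va)       # evaluate after each minimization step
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score

def train_with_restarts(x_tr, z_tr, x_va, z_va, n_restarts=5):
    """Train from several starting points; return the run with lowest validation scatter."""
    runs = [train_one(seed, x_tr, z_tr, x_va, z_va) for seed in range(n_restarts)]
    return min(runs, key=lambda run: run[1])
```

Selecting weights on the validation set means the validation scatter is slightly optimistic as
an estimate of performance on new data, which is one reason to keep the two sets independent.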

We calculated photo-z's using galaxy magnitudes, colors, and the concentration indices for all
passbands. The concentration index $c_i$ in a passband i is defined as the ratio of PetroR50
and PetroR90, which are the radii that encircle 50% and 90% of the Petrosian flux,
respectively. Early-type (E and S0) galaxies, with centrally peaked surface brightness
profiles, tend to have low values of the concentration index, while late-type spirals, with
quasi-exponential light profiles, typically have higher values of c. Previous studies (Morgan
1958; Shimasaku et al. 2001; Yamauchi et al. 2005; Park & Choi 2005) have shown that the
concentration parameter correlates well with galaxy morphological type, and we used it to help
break the degeneracy between redshift and galaxy type. We present the photo-z results for
different combinations of input parameters in §5.

4.2. Photometric redshift errors 

We estimated photo-z errors for objects in the photometric catalog using the Nearest Neighbor
Error (NNE) estimator (Oyaizu et al. 2008b), which is publicly available (see footnote 4). The
NNE method is training-set based, with a neighbor selection similar to the NNP photo-z
estimator; it associates photo-z errors to photometric objects by considering the errors for
objects with similar multi-band magnitudes in the validation set. We use the validation set
because the photo-z's of the training set could be over-fit, which would result in NNE
underestimating the photo-z errors. In studies of photo-z error estimators applied to mock and
real galaxy catalogs, Oyaizu et al. (2008b) found that NNE accurately predicts the photo-z
error when the training set is representative of the photometric sample. In the following,
$\sigma_z^{\rm NNE}$ will denote the nearest-neighbor error estimate.

Footnote 4: http://kobayashi.physics.lsa.umich.edu/~ccunha/nearest/
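The idea behind the neighbor-based error estimate can be sketched as follows: for each
photometric object, find the validation-set galaxies closest to it in magnitude space and
summarize their known photo-z errors. This is a minimal brute-force illustration, not the
public NNE code. The Euclidean distance, the choice of 100 neighbors, the use of the 68th
percentile of |z_phot - z_spec|, and the function name are all assumptions.

```python
import numpy as np

def nne_error(mags, valid_mags, valid_zphot, valid_zspec, n_neighbors=100):
    """68th-percentile |z_phot - z_spec| of the nearest validation-set neighbors."""
    mags = np.atleast_2d(np.asarray(mags, dtype=float))
    valid_mags = np.asarray(valid_mags, dtype=float)
    abs_dz = np.abs(np.asarray(valid_zphot, dtype=float) - np.asarray(valid_zspec, dtype=float))
    k = min(n_neighbors, len(abs_dz))
    errors = np.empty(len(mags))
    for j, m in enumerate(mags):
        dist2 = np.sum((valid_mags - m) ** 2, axis=1)   # squared distance in magnitude space
        nearest = np.argpartition(dist2, k - 1)[:k]     # indices of the k closest objects
        errors[j] = np.percentile(abs_dz[nearest], 68)
    return errors
```

For a catalog of millions of objects a spatial index (for example a k-d tree) would replace the
brute-force loop; the loop is kept here only for clarity.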

5. Results 

To test the quality of the photo-z estimates, we use the photo-z bias $z_{\rm bias}$ and the
photo-z RMS scatter $\sigma$, defined by

$$ z_{\rm bias} = \frac{1}{N} \sum_{i=1}^{N} \left( z_{{\rm phot},i} - z_{{\rm spec},i} \right) \qquad (1) $$

$$ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \left( z_{{\rm phot},i} - z_{{\rm spec},i} \right)^2 \qquad (2) $$

and $\sigma_{68}$, the range containing 68% of the validation set objects in the distribution
of $\delta z = z_{{\rm phot},i} - z_{{\rm spec},i}$. In other words, $\sigma_{68}$ is the value
of $|z_{{\rm phot},i} - z_{{\rm spec},i}|$ such that 68% of the objects have
$|z_{{\rm phot},i} - z_{{\rm spec},i}| < \sigma_{68}$. Naturally, if the probability
distribution function $P(\delta z)$ is Gaussian, $\sigma$ and $\sigma_{68}$ coincide. We also
consider $\sigma_{95}$, defined in an analogous way.
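The quality indicators defined in this section translate directly into a few lines of array
code. The sketch below is one possible way to compute them; the function name and the
dictionary keys are illustrative choices, and the percentiles use NumPy's default linear
interpolation.

```python
import numpy as np

def photoz_metrics(z_phot, z_spec):
    """Bias, RMS scatter, and 68%/95% ranges of z_phot - z_spec."""
    dz = np.asarray(z_phot, dtype=float) - np.asarray(z_spec, dtype=float)
    return {
        "bias": dz.mean(),                          # Eq. (1)
        "sigma": np.sqrt(np.mean(dz ** 2)),         # Eq. (2)
        "sigma68": np.percentile(np.abs(dz), 68),   # 68% of objects have |dz| below this
        "sigma95": np.percentile(np.abs(dz), 95),   # 95% of objects have |dz| below this
    }
```

Unlike the RMS scatter, the percentile-based indicators are insensitive to a small number of
large outliers, which is why the two can differ substantially for a non-Gaussian distribution.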

We computed photo-z's using the ANN method with different combinations of input photometric
observables. All tested combinations are listed in Table 2. In case M, we use the five
magnitudes ugriz. In case C, we use the four colors u - g, g - r, r - i and i - z. In case CC,
we use the four colors with the concentration indices $c_u$, $c_g$, $c_r$, $c_i$, $c_z$. We
also repeat the cases M, C and CC splitting the training set and the photometric sample into 4
bins of r magnitude, r < 18, 18 < r < 20, 20 < r < 22, 22 < r < 24.5, and perform separate ANN
fits in each bin. These cases are dubbed Msplit, Csplit and CCsplit, respectively. For all
cases we use the same network configuration, described in Section 4.1.
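The input combinations and the magnitude binning described above can be sketched as follows.
The array layout (columns ordered u, g, r, i, z), the function names, and the treatment of
objects exactly on a bin edge (assigned to the fainter bin) are assumptions made for
illustration.

```python
import numpy as np

R_BIN_EDGES = np.array([18.0, 20.0, 22.0])   # r < 18, 18-20, 20-22, 22-24.5

def build_inputs(mags, conc, case):
    """Network inputs for cases M, C, CC; mags and conc are (N, 5) arrays in ugriz order."""
    colors = mags[:, :-1] - mags[:, 1:]      # u-g, g-r, r-i, i-z
    if case == "M":
        return mags
    if case == "C":
        return colors
    if case == "CC":
        return np.hstack([colors, conc])     # 4 colors + 5 concentration indices
    raise ValueError(f"unknown case: {case}")

def r_bin_index(r_mag):
    """Index 0..3 of the r-magnitude bin used by the 'split' cases."""
    return np.digitize(r_mag, R_BIN_EDGES)
```

In the split cases a separate network would be trained and applied for each value returned by
`r_bin_index`, using the same `build_inputs` output within each bin.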

In Fig. 3 we plot the photometric redshift, $z_{\rm phot}$, for 10,000 randomly selected
objects from the validation set vs. the true spectroscopic redshift, $z_{\rm spec}$, for all
considered cases. In each panel, the solid line traces $z_{\rm phot} = z_{\rm spec}$ and the
dashed and dotted lines show the corresponding 68% and 95% regions ($\sigma_{68}$ and
$\sigma_{95}$), respectively, defined in $z_{\rm spec}$ bins. We find that all cases produce
very similar results, in agreement with Oyaizu et al. (2008a).

Table 3 shows a summary of the performance results of the different ANN cases. The standard
deviation in these values, estimated from the five networks mentioned in Section 4.1, is 0.001.
We also show in Figure 4 the performance indicators $\sigma$ and $\sigma_{68}$ as functions of
r magnitude for all cases. We see that the photo-z scatter increases considerably for r > 22.
This effect can be explained by the small number of objects in the training set covering this
regime (see Figure 1). In addition, we show in Figs. 5 and 7 $z_{\rm bias}$, $\sigma$ and
$\sigma_{68}$ as functions of estimated photo-z and, in Figs. 6 and 8, the same indicators as
functions of the spectroscopic redshift. We can see that the values of these indicators
increase for $z_{\rm phot} > 0.75$ regardless of the case considered. We show in Table 4
another important indicator, the fraction of catastrophic results, here defined as the number
of objects for which we get $|z_{\rm phot} - z_{\rm spec}| > 0.1$ divided by the total number
in the sample. This definition corresponds to ~12% of the distribution of
$|z_{\rm phot} - z_{\rm spec}|$ for this sample. Based on these results we choose Msplit as the
best case. Specifically, Msplit has overall smaller $\sigma_{68}$ as a function of magnitude
(Figure 4) and a lower fraction of catastrophic results (Table 4).
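The catastrophic fraction used in Table 4 is straightforward to compute from a validation set.
The following is a minimal sketch; the function name is an illustrative choice and the
threshold defaults to the value of 0.1 used in the text.

```python
import numpy as np

def catastrophic_fraction(z_phot, z_spec, threshold=0.1):
    """Fraction of objects with |z_phot - z_spec| above the threshold."""
    dz = np.abs(np.asarray(z_phot, dtype=float) - np.asarray(z_spec, dtype=float))
    return float(np.mean(dz > threshold))

# Four objects, one of which misses by 0.3
print(catastrophic_fraction([0.1, 0.5, 0.9, 0.3], [0.1, 0.2, 0.85, 0.31]))  # 0.25
```

Applying the function separately to the objects in each r-magnitude range gives the per-column
entries of a table laid out like Table 4.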

In Fig. 9 we plot the colors u - g, g - r, r - i and i - z versus spectroscopic redshift for
bright (r < 22) and faint (r > 22) galaxies in the validation set. We see that, for faint
galaxies, colors and spectroscopic redshift are barely correlated. Such degeneracy explains the
low efficiency of the method in this magnitude regime.

In Fig. 10 we plot the normalized error distribution, i.e., the distribution of
$(z_{\rm phot} - z_{\rm spec}) / \sigma_z^{\rm NNE}$, for objects in the spectroscopic sample,
using the Msplit case, in r magnitude slices, without any bias correction. The solid lines show
Gaussian distributions with zero mean and unit variance. These plots indicate that, on average,
the photo-z estimates are nearly unbiased and the NNE error is a good estimate of the true
error, although we can see some asymmetry in the distribution depending on the magnitude range.
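The comparison with a unit Gaussian can also be expressed numerically: if the error estimates
are accurate, the normalized errors should have mean near 0, standard deviation near 1, and
about 68% of their values within one unit of zero. The sketch below computes these summaries;
the function names and the choice of summary statistics are assumptions, not part of the
analysis described in the text.

```python
import numpy as np

def normalized_errors(z_phot, z_spec, sigma_nne):
    """(z_phot - z_spec) / sigma_NNE for each object."""
    z_phot, z_spec = np.asarray(z_phot, dtype=float), np.asarray(z_spec, dtype=float)
    return (z_phot - z_spec) / np.asarray(sigma_nne, dtype=float)

def pull_summary(pulls):
    """Mean, standard deviation, and fraction of normalized errors within +/-1."""
    pulls = np.asarray(pulls, dtype=float)
    return {
        "mean": float(pulls.mean()),
        "std": float(pulls.std()),
        "frac_within_1": float(np.mean(np.abs(pulls) < 1.0)),
    }
```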

In Fig. 11 we show the distribution of the estimated photometric redshift, corrected for the
bias, $z_{\rm phot} - z_{\rm bias}$, for the photometric sample, in r magnitude bins, for our
best case (Msplit). The bias was estimated from the validation sample in photo-z bins of width
0.04, as in Fig. 5. The bias correction is included in the final catalog.
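A binned bias correction of this kind can be sketched in two steps: measure the mean of
z_phot - z_spec in photo-z bins on the validation set, then subtract the value for the
matching bin from each photo-z in the photometric sample. The bin width of 0.04 follows the
text; the upper redshift limit, the handling of empty bins (zero correction), the clipping of
out-of-range values to the edge bins, and the function names are assumptions.

```python
import numpy as np

def fit_bias_table(zp_valid, zs_valid, width=0.04, z_max=1.6):
    """Mean (z_phot - z_spec) of validation objects in photo-z bins of the given width."""
    zp_valid = np.asarray(zp_valid, dtype=float)
    zs_valid = np.asarray(zs_valid, dtype=float)
    edges = np.arange(0.0, z_max + width, width)
    idx = np.digitize(zp_valid, edges) - 1
    bias = np.zeros(len(edges) - 1)              # empty bins keep zero correction
    for b in range(len(bias)):
        in_bin = idx == b
        if in_bin.any():
            bias[b] = np.mean(zp_valid[in_bin] - zs_valid[in_bin])
    return edges, bias

def correct_bias(z_phot, edges, bias):
    """Subtract the binned bias; values outside the table use the nearest edge bin."""
    z_phot = np.asarray(z_phot, dtype=float)
    idx = np.clip(np.digitize(z_phot, edges) - 1, 0, len(bias) - 1)
    return z_phot - bias[idx]
```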

For a significant fraction of the photometric sample the nearest-neighbor error estimate is
large (greater than 10% of the photo-z value), and for most science cases it will be necessary
to cut the catalog. We show in Fig. 12 the photo-z distributions for the whole sample (as in
Fig. 11) and for objects with $\sigma_z^{\rm NNE} < 0.1$. We also show in Fig. 13 the
photometric redshift, $z_{\rm phot}$, for 10,000 randomly selected objects from the validation
set vs. the true spectroscopic redshift, $z_{\rm spec}$, for the same low-error subsample.

We found that the use of concentration parameters does not significantly improve the results,
in contrast to our initial expectation based on the SDSS DR6 results (Oyaizu et al. 2008b).
O'Mill et al. (2011) also found that these parameters improve the results for the SDSS DR7 main
data. The lack of improvement here is related to the error in the measured moments at fainter
magnitudes, which is especially important for this sample; the additional noise roughly cancels
the additional information from these parameters. Similar conclusions can be found in Singal
et al. (2011), although their definition of concentration is not the same as the one used here.



6. Accessing the Catalog 

The best-case, bias-corrected photo-z catalog (Msplit) is publicly available as an SDSS
value-added catalog at http://www.sdss.org/dr7/products/value_added/index.html.

7. Conclusions 

We have presented a public catalog of photometric redshifts for the SDSS coadd photometric
sample. The photo-z estimates are based on the ANN method, using the five magnitudes ugriz as
input parameters and performing the training separately in r magnitude bins (Msplit). Our tests
indicate that the photo-z estimates are most reliable for galaxies with r < 22 and that the
scatter increases significantly at fainter magnitudes. Based on our results, we advise the
reader to use this catalog with care for $z_{\rm phot} > 0.75$, since all performance
indicators show a lower efficiency of the method, with the chosen spectroscopic sample, in this
redshift range. However, depending on the specific science goals, a simple quality cut on the
photo-z error might be sufficient to mitigate this problem at the desired level.

Funding for the Sloan Digital Sky Survey (SDSS) and SDSS-II has been provided by the Alfred P.
Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S.
Department of Energy, the National Aeronautics and Space Administration, the Japanese
Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England.
The SDSS Web site is http://www.sdss.org/.

The SDSS is managed by the Astrophysical 
Research Consortium (ARC) for the Participat- 
ing Institutions. The Participating Institutions 
are the American Museum of Natural History, 
Astrophysical Institute Potsdam, University of 
Basel, University of Cambridge, Case Western 
Reserve University, The University of Chicago, 
Drexel University, Fermilab, the Institute for Ad- 
vanced Study, the Japan Participation Group, 
The Johns Hopkins University, the Joint Institute 
for Nuclear Astrophysics, the Kavli Institute for 
Particle Astrophysics and Cosmology, the Korean 
Scientist Group, the Chinese Academy of Sciences 
(LAMOST), Los Alamos National Laboratory, the 
Max-Planck-Institute for Astronomy (MPIA), the 







Fig. 3. $z_{\rm phot}$ versus $z_{\rm spec}$ for the validation set for different spectroscopic
sets and different choices of photometric observables. Top Left: Case C, where the input
photometric data comprise the 4 colors (u - g, g - r, r - i, i - z). Top Middle: Case CC, where
the input data are the 4 colors u - g, g - r, r - i, i - z, and 5 concentration parameters
$c_u$, $c_g$, $c_r$, $c_i$, $c_z$. Top Right: Case M, where we use only magnitudes. Bottom
Left: Case Csplit, where we split the sample in r magnitude slices. Bottom Middle: Case
CCsplit, where we split the sample in r magnitude slices. Bottom Right: Case Msplit, where we
split the sample in r magnitude slices. The solid line in each panel indicates
$z_{\rm phot} = z_{\rm spec}$; the dashed and dotted lines show the 68% and 95% confidence
regions as a function of $z_{\rm spec}$ ($\sigma_{68}$ and $\sigma_{95}$), respectively. The
points display results for a random subset of 10,000 objects from the validation set.



Table 2
Description of the different combinations

Case      Inputs / Description
C         u - g, g - r, r - i, i - z
Csplit    u - g, g - r, r - i, i - z, split in r slices
M         u, g, r, i, z
Msplit    u, g, r, i, z, split in r slices
CC        u - g, g - r, r - i, i - z + $c_u$, $c_g$, $c_r$, $c_i$, $c_z$
CCsplit   u - g, g - r, r - i, i - z + $c_u$, $c_g$, $c_r$, $c_i$, $c_z$, split in r slices
Fig. 4. $\sigma$ and $\sigma_{68}$ as functions of r magnitude for all tested cases.

Fig. 7. $\sigma$ and $\sigma_{68}$ as a function of the photometric redshift for all tested cases.

Fig. 8. $\sigma$ and $\sigma_{68}$ as a function of the spectroscopic redshift for all tested cases.

Table 3
Summary of ANN cases

Case      $\sigma$   $\sigma_{68}$
C         0.16       0.046
Csplit    0.14       0.034
M         0.14       0.034
Msplit    0.14       0.031
CC        0.15       0.043
CCsplit   0.14       0.032

Note. $\sigma$ and $\sigma_{68}$ for the validation set using different input parameters
(magnitudes, colors, and concentration indices) and training procedures (training with the
whole sample or in magnitude bins independently).
Table 4
Catastrophic redshifts

Case      r < 18   18 < r < 19   19 < r < 20   20 < r < 21   21 < r < 22   22 < r < 23   r > 23   all
C         0.020    0.034         0.048         0.092         0.14          0.22          0.17     0.075
Csplit    0.0013   0.0063        0.0058        0.093         0.084         0.28          0.29     0.062
M         0.0012   0.0034        0.012         0.054         0.10          0.26          0.26     0.058
Msplit    0.0012   0.0042        0.0068        0.059         0.11          0.25          0.24     0.055
CC        0.013    0.022         0.030         0.066         0.13          0.25          0.21     0.069
CCsplit   0.0012   0.0053        0.0056        0.089         0.083         0.28          0.28     0.060

Note. Fraction of objects ($N_{\rm cat}/N_{\rm total}$) with $|z_{\rm phot} - z_{\rm spec}| > 0.1$
for the validation set using different input parameters (colors, concentration indices and
magnitudes) and training procedures.
Fig. 9. Colors vs. spectroscopic redshift for galaxies in the validation set. Red squares (blue
circles) denote galaxies with r < 22 (r > 22). The curves are the predicted color-redshift
relations for different types of galaxies (E, Sbc, Im) obtained by redshifting the k-corrected
SEDs of Assef et al. (2010) and applying the appropriate filters.

Fig. 10. Distributions of $(z_{\rm phot} - z_{\rm spec}) / \sigma_z^{\rm NNE}$ for objects in
the spectroscopic sample, in r magnitude slices, for the Msplit case.

Fig. 11. Photometric redshift distributions, corrected for bias, in r magnitude slices for the
case Msplit.

Fig. 13. Photo-z vs. spectroscopic redshift for the Msplit case. Left: Full sample as in
Fig. 3. Right: Only objects with $\sigma_z^{\rm NNE} < 0.1$.

Fig. 12. Photometric redshift distributions for the case Msplit. Left: All objects. Right:
Objects with $\sigma_z^{\rm NNE} < 0.1$.
A. Data Query Code



Here we provide the SDSS database query used to obtain the catalog containing the photometric sample
used in this paper. Notice that the query requires the TYPE flag to be set to 3 (galaxies) and selects objects
with dereddened model magnitude 16 < r < 24.5 that do not have any of the following flags: BRIGHT,
SATURATED and SATUR_CENTER. The full query is shown below.



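SELECT ObjID, ra, dec,
    dered_u, dered_g, dered_r, dered_i, dered_z,
    -- concentration indices: ratio of the Petrosian 50% and 90% light radii
    petroR50_u/petroR90_u as c_u, petroR50_g/petroR90_g as c_g,
    petroR50_r/petroR90_r as c_r,
    petroR50_i/petroR90_i as c_i, petroR50_z/petroR90_z as c_z,
    err_u, err_g, err_r, err_i, err_z
INTO coadd_mags_allinone
FROM Stripe82..PhotoObjAll
WHERE (flags_r & 0x0000080000040002) = 0   -- none of BRIGHT, SATURATED, SATUR_CENTER set
    AND type = 3                           -- classified as galaxy
    AND mode = 1                           -- primary objects
    AND (run = 106 or run = 206)           -- co-addition runs
    AND dered_r BETWEEN 16 AND 24.5        -- magnitude limits of the sample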

We made an additional cut in order to select only objects which have positive values for petroR50/petroR90. 
The final catalog has 13,688,828 galaxies. 

Here we provide a brief description of the flags used in the query: BRIGHT indicates that an object is
a duplicate detection of an object with signal to noise greater than 200σ; SATURATED indicates that an
object contains one or more saturated pixels; SATUR_CENTER indicates that the object center is close to at
least one saturated pixel. Note that in selecting PRIMARY objects (using PhotoPrimary), we have implicitly
selected objects that either do not have the BLENDED flag set or else have NODEBLEND set or nchild equal
to zero. In addition, the PRIMARY catalog contains no BRIGHT objects, so the cut on BRIGHT objects in
the query above is in fact redundant. BLENDED objects have multiple peaks detected within them, which
PHOTO attempts to deblend into several CHILD objects. NODEBLEND objects are BLENDED but no
deblending was attempted on them, because they are either too close to an EDGE, or too large, or one of
their children overlaps an edge. A few percent of the objects in our photometric sample have NODEBLEND
set; some users may wish to remove them.

We also suggest that users require objects to have the BINNED1 flag set. BINNED1 objects were detected
at > 5σ significance in the original imaging frame.

The SDSS webpage (see footnote 5) provides further recommendations about flags, which we strongly
recommend that users read.
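The hexadecimal mask in the query can be read as the bitwise OR of the three rejected flags, and
the additional suggestions in this appendix (require BINNED1, optionally drop NODEBLEND) can be
applied in the same way. The sketch below illustrates this in Python. The bit values are quoted
from the SDSS PHOTO flag definitions as we understand them and should be checked against the
flags documentation; the three rejection bits can at least be verified against the query, since
they combine to exactly the mask used there. The function names are illustrative.

```python
# Bit values of the SDSS PHOTO flags mentioned in this appendix
# (assumption: taken from the SDSS flag tables; verify before use).
PHOTO_FLAGS = {
    "BRIGHT": 0x0000000000000002,
    "NODEBLEND": 0x0000000000000040,
    "SATURATED": 0x0000000000040000,
    "BINNED1": 0x0000000010000000,
    "SATUR_CENTER": 0x0000080000000000,
}

# Mask used in the WHERE clause: reject if any of these bits is set.
REJECT_MASK = (PHOTO_FLAGS["BRIGHT"] | PHOTO_FLAGS["SATURATED"]
               | PHOTO_FLAGS["SATUR_CENTER"])

def passes_query_flags(flags_r):
    """True if none of BRIGHT, SATURATED, SATUR_CENTER is set (as in the query)."""
    return (flags_r & REJECT_MASK) == 0

def passes_suggested_flags(flags_r):
    """Stricter selection: also require BINNED1 and drop NODEBLEND objects."""
    return (passes_query_flags(flags_r)
            and (flags_r & PHOTO_FLAGS["BINNED1"]) != 0
            and (flags_r & PHOTO_FLAGS["NODEBLEND"]) == 0)

print(hex(REJECT_MASK))  # 0x80000040002
```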



Footnote 5: http://cas.sdss.org/dr7/en/help/docs/algorithm.asp?search=flags&submit1=Search